The techniques described in this chapter for summarizing, graphing, and comparing survival data deal
with the time interval from a defined starting point to the first occurrence of an endpoint event. The
event can be designated as death or a relapse of a particular condition, such as a recurrence of cancer.
Or you could designate the event to be surgical removal (called an explant) of a failed mechanical
component, such as an artificial heart valve. If a patient’s heart valve was implanted on January 10
(beginning of time interval), but their body rejected it and the explant took place on January 30 (time of
event), then the time interval from implant to explant is 30 – 10, or 20 days.
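If you store the dates electronically, you can let software do the subtraction. The following is a minimal Python sketch of the implant-to-explant calculation. The text gives only the month and day, so the year 2024 is an assumed placeholder, and the function name days_to_event is made up for this illustration.

```python
from datetime import date

# Assumption: the text gives only month and day, so 2024 is a placeholder year.
implant_date = date(2024, 1, 10)   # starting point of the time interval
explant_date = date(2024, 1, 30)   # date the endpoint event occurred

def days_to_event(start, event):
    """Return the number of days from the starting point to the event."""
    return (event - start).days

interval = days_to_event(implant_date, explant_date)
print(interval)  # 20
```

Subtracting one date from another handles month lengths and leap years for you, which matters once the interval spans more than a single month.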
A person can die only once, so survival analysis can obviously be used for one-time events. But other
endpoints can occur multiple times, such as having a stroke or having cancer go into remission. The
techniques we describe in this chapter analyze only the time to the first occurrence of the event.
Handling multiple occurrences of an event requires more advanced survival analysis methods, which are
beyond the scope of this book.
The starting point of the time interval is somewhat arbitrary, so it must be defined explicitly
every time you do a survival analysis. Imagine that you’re studying the progression of chronic
obstructive pulmonary disease (COPD) in a group of patients. If you want to study the natural
history of the disease, the starting point can be the diagnosis date. But if you’re instead interested
in evaluating the efficacy of a treatment, the starting point can be defined as the date the treatment
began.
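To see how much the choice of starting point matters, consider the sketch below. It computes the survival time for one COPD patient twice, once from diagnosis and once from the start of treatment. The patient and all three dates are invented for illustration, and the function name survival_days is hypothetical.

```python
from datetime import date

# Hypothetical COPD patient: all three dates are invented for illustration.
patient = {
    "diagnosis": date(2020, 3, 1),
    "treatment_start": date(2020, 9, 1),
    "event": date(2023, 3, 1),
}

def survival_days(record, start_key):
    """Days from the chosen starting point to the endpoint event."""
    return (record["event"] - record[start_key]).days

# Natural-history question: start the clock at diagnosis.
natural_history_days = survival_days(patient, "diagnosis")

# Treatment-efficacy question: start the clock when treatment began.
treatment_days = survival_days(patient, "treatment_start")

print(natural_history_days, treatment_days)
```

The same patient and the same event produce two different survival times, which is why the starting point has to be stated explicitly in every analysis.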
Recognizing that survival times aren’t normally distributed
Even though survival times are numerical quantities, they’re almost never normally
distributed. Because of this, it’s generally not a good idea to use the following:
Means and standard deviations to describe survival times
T tests and ANOVAs to compare survival times between groups
Least-squares regression to investigate how survival time is influenced by other factors
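A quick way to see the problem with means is to compare the mean and the median of a skewed set of survival times. The sketch below uses Python's standard statistics module. The nine survival times (in months) are made up for illustration and don't come from any real study.

```python
import statistics

# Made-up survival times in months: most are short, one is very long.
survival_months = [2, 3, 4, 5, 6, 8, 10, 14, 60]

mean_months = statistics.mean(survival_months)
median_months = statistics.median(survival_months)

# The single long survivor pulls the mean well above the typical value,
# while the median stays at the middle observation.
print(f"mean   = {mean_months:.1f} months")
print(f"median = {median_months} months")
```

In this invented example, the mean is roughly twice the median, even though eight of the nine patients had survival times of 14 months or less.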
If non-normality were the only problem with survival data, you’d be able to summarize survival times
as medians and centiles instead of means and standard deviations. Also, you could compare survival
between groups with nonparametric Mann-Whitney and Kruskal-Wallis tests instead of t tests and
ANOVAs. But time-to-event data are susceptible to a specific type of missingness called censoring.
Neither the typical parametric methods nor their nonparametric alternatives are equipped to deal with
censoring, so we present survival analysis techniques in this chapter.
Considering censoring
Survival data are defined as the time interval between a selected starting point and an endpoint that
represents an event. But unfortunately, the time the event takes place can be missing in survival data.
This can happen in two general ways: